August 22, 2018

Outline

What we'll cover today

  • Intro & general tips
  • R performance tips: patterns to use and avoid
  • Vectors & Matrices
  • Memory management, large tables
  • Profiling and Benchmarking
  • Loops

General advice

  • If you don't understand something, try some experiments
  • Browse the documentation, learn its jargon and quirks
  • Break your code into functions when appropriate
  • Use functions to reduce the need for global variables
  • Write tests for your functions
  • Use git to keep track of changes
  • Learn to distribute your code as a package

Is R slow?

Is R slow?

Sometimes, but well written R programs are usually fast enough.

  • Designed to make programming easier
    • Speed was not the primary design criterion
  • Slow programs are often the result of poor programming practices or of not understanding how R works
  • There are various options for calling C or C++ functions from R

R performance before you start

Premature optimization is the root of all evil – Donald Knuth

  • Become familiar with R's vector and apply functions
  • Consider specialized performance packages
  • E.g. data.table, bigmemory, plyr, RSQLite, snow, multicore, parallel
  • Don't use an R GUI when performance is important

R tuning advice

  • Be methodical but don't get carried away with micro-optimizations
  • Use monitoring tools such as top, Activity Monitor, or Task Manager
  • Use vector functions
  • Avoid duplication of objects
  • Pre-allocate result vectors
  • Profile your code and run benchmarks
  • Byte-compile with cmpfun, or call a compiled language (e.g. C, C++)

Vectors & Matrices

Vectors are central to good R programming

  • Fast, since implemented as a single C or Fortran function
  • Concise and easy to read
  • Can often replace for loops
  • However, heavy use can result in high memory usage
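
For example, the same computation written as an explicit loop and as a vector expression (the sumsq functions are illustrative, not from the slides):

```r
# Sum of squares as an R-level loop: one interpreted iteration per element
sumsq.loop <- function(x) {
  total <- 0
  for (v in x) {
    total <- total + v^2
  }
  total
}

# Vectorized: the elementwise work happens in a single compiled call
sumsq.vec <- function(x) sum(x^2)

x <- as.double(1:1000)
sumsq.loop(x) == sumsq.vec(x)  # same answer; the vectorized form is far faster
```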

Useful vector functions

  • math operators: +, -, *, /, ^, %/%, %%
  • math functions: abs, sqrt, exp, log, log10, cos, sin, tan, sum, prod
  • logical operators: &, |, !
  • relational operators: ==, !=, <, >, <=, >=
  • string functions: nchar, tolower, toupper, grep, sub, gsub, strsplit
  • conditional function: ifelse (pure R code)
  • misc: which, which.min, which.max, pmax, pmin, is.na, any, all, rnorm, runif, sprintf, rev, paste, as.integer, as.character
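
A few of these in action on a small vector (results shown in comments):

```r
x <- c(-2, 5, NA, 3)

abs(x)                       # 2 5 NA 3   (elementwise math function)
which(x > 0)                 # 2 4        (NA comparisons are dropped)
pmax(x, 0)                   # 0 5 NA 3   (parallel maximum)
ifelse(x > 0, "pos", "not")  # "not" "pos" NA "pos"
any(is.na(x))                # TRUE
```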

Dynamic features of vectors

initialize x and fill it with zeros

n <- 10
x <- double(n)
x
 [1] 0 0 0 0 0 0 0 0 0 0

Extend x by assignment

x[15] <- 100
x
 [1]   0   0   0   0   0   0   0   0   0   0  NA  NA  NA  NA 100

Dynamic features of vectors

Resize/truncate x

length(x) <- 5
x
[1] 0 0 0 0 0

rnorm vector function

x <- rnorm(10)
x
 [1] -2.18729173 -1.40101348  1.92523250 -0.37852715 -2.00779650
 [6]  0.63690109 -0.09267038 -0.62119000 -1.95511252  0.57848248

Vector indexing

Extract subvector

x[3:6]
[1]  1.9252325 -0.3785271 -2.0077965  0.6369011

Extract elements using result of vector relational operation

x[x > 0]
[1] 1.9252325 0.6369011 0.5784825

Vector indexing

You can also use an index to assign values

x[is.na(x)] <- 0

Matrix indexing

Make a new matrix

m <- matrix(rnorm(100), 10, 10)

Extract a 2x3 submatrix (non-consecutive columns)

m[3:4, c(5,7,9)]
            [,1]       [,2]       [,3]
[1,] -0.01977612 0.28915255  0.1152080
[2,]  0.10563771 0.05858727 -0.1579061

Matrix indexing

Extract arbitrary elements as vector

m[cbind(3:6, c(2,4,6,9))]
[1] -0.7105643 -1.8536242 -0.7136148  0.6650354

Extract elements using result of vector relational operation

head(m[m > 0])
[1] 0.5001715 1.8025833 0.8039774 0.4841908 1.2018386 0.4711964

Matrix indexing

You can also use a matrix index to assign values

m[is.na(m)] <- 0

Memory Considerations

Memory in R

  • Avoid duplicating objects, especially big ones or ones modified in loops
  • Look into memory-efficient libraries
  • Look into other formats for storing data

Beware of object duplication


  • R uses pass by value semantics for function arguments
  • In general, this requires making copies of objects
    • Functions must return modified object
  • R tries to avoid copying unless necessary
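
A minimal sketch of these semantics (set.first is a made-up name):

```r
# The function logically receives a copy: writing to the argument
# triggers duplication, so the caller's object is never changed
set.first <- function(v) {
  v[1] <- 0
  v
}

x <- c(1, 2, 3)
y <- set.first(x)
x[1]  # still 1: the caller's vector is untouched
y[1]  # 0: the modified copy must be returned to be kept
```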

Example of object duplication

tracemem reports when an object is duplicated, which is very useful for debugging performance problems.

In this example, object duplication is expected and helpful.

x <- double(10)
tracemem(x)
[1] "<0x55c9d3446ac0>"
y <- x
y[1] <- 10
tracemem[0x55c9d3446ac0 -> 0x55c9d3431200]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> 

Example of object duplication

.Internal(inspect(x))
@55c9d3446ac0 14 REALSXP g0c5 [NAM(2),TR] (len=10, tl=0) 0,0,0,0,0,...
.Internal(inspect(y))
@55c9d3431200 14 REALSXP g0c5 [NAM(1),TR] (len=10, tl=0) 10,0,0,0,0,...
x[1] <- 50
tracemem[0x55c9d3446ac0 -> 0x55c9d338e978]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> 

Example of unexpected object duplication

Passing a matrix to a non-primitive function such as nrow marks it for duplication: the next modification copies the whole object

> m <- matrix(0, 3, 3)
> tracemem(m)
[1] "<0x7fc168d29df0>"
> m[1,1] <- 1
> nrow(m)
[1] 3
> m[1,1] <- 2
tracemem[0x7fc168d29df0 -> 0x7fc168d21f58]:

Splitting problem into smaller tasks

R makes it easy to read an entire data set in one operation, but reading it in parts can be much more efficient.

  • Splitting the problem into smaller tasks is compatible with parallel computing techniques
  • The foreach & iterators packages provide tools to split inputs into smaller pieces
  • Use Linux commands (split, awk, etc.) or other languages (e.g. Python) to preprocess the data
    • Split data files and remove unneeded fields
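
A base-R sketch of chunked processing (the function name and chunk size are made up):

```r
# Count lines of a large text file in fixed-size chunks
# instead of reading the whole file into memory at once
count.in.chunks <- function(path, chunk.size = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0
  repeat {
    lines <- readLines(con, n = chunk.size)
    if (length(lines) == 0) break
    total <- total + length(lines)  # replace with real per-chunk work
  }
  total
}
```

The same loop structure works with read.table via its nrows argument, and it maps naturally onto the foreach/iterators style mentioned above.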

Beware of read.table

The read.table function is commonly used for reading data files, but it can be very slow on large files

  • Use of the colClasses argument can improve performance
  • colClasses can be used to skip a column, using less memory
  • It can be faster to read a file in smaller chunks using the nrows argument
  • The scan function can be faster
  • Consider using similar functions from different packages, such as data.table, sqldf, and bigmemory
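
A small sketch of the colClasses idea (the example file and its columns are invented):

```r
# Write a tiny example CSV
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(1.5, 2.5, 3.5), note = c("a", "b", "c")),
          tmp, row.names = FALSE)

# Declaring types up front avoids type guessing;
# "NULL" skips the third column entirely, saving memory
df <- read.table(tmp, header = TRUE, sep = ",",
                 colClasses = c("integer", "numeric", "NULL"))
names(df)  # "id" "score"
```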

Good alternative libraries

bigmemory

  • Defines mutable matrix objects that aren't automatically duplicated
  • Works very well in conjunction with parallel computing

big.matrix

  • Can use a backing file that is memory mapped

biganalytics

  • apply, biglm, bigglm, bigkmeans, colmax

bigtabulate

  • bigsplit, bigtabulate, bigtable, bigtsummary

synchronicity

  • boost.mutex, lock, unlock

Save data in binary format

Saving data in a binary format can make it much faster to read the data later. There are a variety of functions available to do that:

  • save/load
  • writeBin/readBin
  • write.big.matrix/read.big.matrix (from bigmemory)
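
The first two pairs in use (temporary files stand in for real data sets):

```r
x <- rnorm(1000)

# save/load: stores R objects with metadata, restores them by name
f1 <- tempfile()
save(x, file = f1)
rm(x)
load(f1)         # x exists again

# writeBin/readBin: raw doubles, no metadata, minimal overhead
f2 <- tempfile()
writeBin(x, f2)
y <- readBin(f2, what = "double", n = 1000)
identical(x, y)  # TRUE: doubles round-trip exactly
```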

SQLite in R

Consider putting data into an SQLite database.

  • RSQLite package is easy to use
  • Easy to get subsets of the data into a data frame
  • Command line tool very useful for experimenting with queries
  • Database can be accessed from many different languages
  • The sqldf package may be useful also
  • Can be quite slow

Profiling and Benchmarking

Find your marble, then start chiseling

Profiling vs Benchmarking

Profiling

  • If you've decided you need your code to perform better, profile first
  • Profiling helps isolate hot spots
  • Time spent there will likely yield the best ROI

Benchmarking

  • With hot spots in hand, examine the code and propose alternatives
  • While ensuring the results are the same, ask which performs best

Profiling

R profiling tools

R has built-in support for profiling, but there are additional packages available:

  • proftools
  • profvis (RStudio integration)

Basic profiling with Rprof

f <- function(a) { g1(a) + g2(2 * a) }

g1 <- function(a) { h(a) }

g2 <- function(a) { sqrt(a) }

h <- function(a) {
  b <- double(length(a))
  for (i in seq_along(a)) {
    b[i] <- sqrt(a[i])
  }
  b
}

Basic profiling with Rprof

x <- 1:1000000
Rprof('prof.out')
for (i in 1:10) {
  y <- f(x)
}
Rprof(NULL)
summaryRprof("prof.out")$by.self
         self.time self.pct total.time total.pct
"h"           0.84    56.76       0.90     60.81
"g2"          0.46    31.08       0.56     37.84
"double"      0.06     4.05       0.06      4.05
"sqrt"        0.06     4.05       0.06      4.05
"*"           0.04     2.70       0.04      2.70
"f"           0.02     1.35       1.48    100.00

Basic profiling with profvis

Can also do this in RStudio, e.g. Profile -> Start Profiling

profvis({
for (i in 1:10) {
  y <- f(x)
}
})

Benchmarking

Knowing where code is slow via profiling, use benchmarking tools

  • Put problem code into functions
  • Benchmark different versions of code for comparison
  • system.time is useful for long running code
  • microbenchmark package is useful for analyzing short running code
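
For example, system.time can compare a looped and a vectorized version of the same computation (slow.sqrt is a made-up example):

```r
# Loop version: square roots one element at a time
slow.sqrt <- function(x) {
  y <- double(length(x))
  for (i in seq_along(x)) {
    y[i] <- sqrt(x[i])
  }
  y
}

x <- as.double(1:1e6)
system.time(y1 <- slow.sqrt(x))  # loop version
system.time(y2 <- sqrt(x))       # vectorized version
identical(y1, y2)                # confirm both give the same answer
```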

Loops

Wherefore art thou performant or not?

Are for loops in R slow?

  • Not all for loops are bad
  • The most common mistakes involve for loops
  • The classic mistake is not preallocating the result vector

Example 1

Create a vector of length n where all values are x

A bad for loop

bad.for <- function(n,x) {
  result <- NULL
  for (i in 1:n) {
    result[i] <- x
  }
  result
}

  • Large number of iterations
  • Tiny amount of computation per iteration
  • The result vector is reallocated and copied on each iteration
  • This triggers garbage collection periodically

A better for loop

okay.for <- function(n,x) {
  result <- double(n)
  for (i in 1:n) {
    result[i] <- x
  }
  result
}

Improvement over the previous example, but it's still slow because of the many tiny iterations.

A puzzle loop

strange.for <- function(n, x) {
  result <- NULL
  for (i in n:1) {
    result[i] <- x
  }
  result
}

Is this loop faster or slower than the previous two?

Using a vector function

We can also use vector assignment

# use of vector assignment
vector.assn <- function(n, x) {
  result <- double(n)
  result[] <- x
  result
}

Using an R built-in function
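
The definition of built.in was not shown here; a version based on rep that matches the test results below would be:

```r
# rep allocates and fills the vector in a single built-in call
built.in <- function(n, x) {
  rep(x, n)
}
```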

Loop testing

Make sure they produce identical output

n <- 10000
x <- 7
bad.result        <- bad.for(n, x)
okay.result       <- okay.for(n, x)
strange.result    <- strange.for(n, x)
vector.result     <- vector.assn(n, x)
built.result      <- built.in(n, x)
c(identical(bad.result, okay.result),
  identical(bad.result, strange.result),
  identical(bad.result, vector.result),
  identical(bad.result, built.result))
[1] TRUE TRUE TRUE TRUE

Loops benchmark results

res <- microbenchmark(bad=bad.for(n,x), okay=okay.for(n,x), strange=strange.for(n,x),
                      vector=vector.assn(n,x), builtin=built.in(n,x))
kable(summary(res, unit="relative"))
expr           min         lq      mean     median        uq       max neval
bad     186.013528 176.178269 84.403879 160.015702 136.61302 6.1794773   100
okay     38.712396  35.938382 16.487535  32.437231  27.92712 0.6577853   100
strange  36.655978  35.103010 16.030565  30.666618  26.66504 0.6827870   100
vector    1.721916   1.656932  1.454867   1.515154   1.39575 1.4773681   100
builtin   1.000000   1.000000  1.000000   1.000000   1.00000 1.0000000   100

Loops benchmark plot

autoplot(res)

Example 2

Create a matrix with n rows and x columns

Each value in the matrix is sampled from the standard normal distribution (mean 0, standard deviation 1)

Another bad for loop

Generate a matrix of values sampled from the normal distribution, with n rows and x columns

bad.norm <- function(n,x) {
  m <- NULL
  for (i in 1:n) {
    m <- rbind(m, rnorm(x))
  }
  m
}

Preallocation of result vector

Just like before, we build a matrix and populate it with a for loop

ok.norm <- function(n,x) {
  m <- matrix(0, nrow=n, ncol=x)
  for (i in 1:n) {
  m[i,] <- rnorm(x)
  }
  m
}

Use lapply and rbind (don't need to preallocate matrix)

lapply.norm <- function(n,x) {
  do.call('rbind', lapply(1:n, function(i) rnorm(x)))
}

Compute all rows at once

best.norm <- function(n,x) {
  m <- rnorm(x * n)
  dim(m) <- c(x, n)
  t(m)
}

Generate test data

n <- 600
x <- 100
# Verify correct results
set.seed(123); bad.result <- bad.norm(n,x)
set.seed(123); ok.result <- ok.norm(n,x)
set.seed(123); lapply.result <- lapply.norm(n,x)
set.seed(123); best.result <- best.norm(n,x)

c(identical(bad.result, ok.result),
  identical(bad.result, lapply.result),
  identical(bad.result, best.result))
[1] TRUE TRUE TRUE

Run benchmarks

res <- microbenchmark(bad=bad.norm(n,x), ok=ok.norm(n,x),
                        lapply=lapply.norm(n,x), best=best.norm(n,x))
kable(summary(res, unit="relative"))
expr         min        lq     mean    median        uq       max neval
bad    11.373141 10.075068 9.479207 10.554045 10.337819 2.1393675   100
ok      1.273609  1.301870 1.203953  1.313377  1.405921 0.1798499   100
lapply  1.328560  1.341476 1.208746  1.338217  1.302755 0.1869873   100
best    1.000000  1.000000 1.000000  1.000000  1.000000 1.0000000   100

Matrix generation benchmark plot

autoplot(res)

Just in Time (JIT) compiling your functions

The cmpfun function from the compiler package compiles your function to bytecode

library(compiler)

fun.for <- function(x, seed=1423) {
  set.seed(seed)
  y <- double(length(x))
  for (i in seq_along(x)) {
    y[i] <- rnorm(1) * x[i]
  }
  y
}
fun.for.compiled <- cmpfun(fun.for)

Benchmarking JIT

x <- 10000
res <- microbenchmark(fun.for=fun.for(x),
                      fun.for.compiled=fun.for.compiled(x))
kable(summary(res, unit="relative"))
expr                   min      lq      mean   median       uq       max neval
fun.for          1.0000000 1.00000 1.0000000 1.000000 1.000000 1.0000000   100
fun.for.compiled 0.9845008 1.00029 0.1338403 1.002452 1.013011 0.0049561   100

Benchmarking JIT

autoplot(res)

Simple parallel computing

Simple parallel computing using mclapply

The standard parallel package includes a very useful function: mclapply

  • mclapply is nearly a drop-in replacement for lapply
  • mc.cores argument specifies number of workers to use
  • does not execute in parallel on Windows (depends on fork system call)
  • not generally safe to use in R GUIs (seems to work in RStudio)
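
A minimal use of mclapply (the worker count is chosen defensively; the computation is just an illustration):

```r
library(parallel)

# mclapply cannot fork on Windows, so fall back to one worker there
n.workers <- if (.Platform$OS.type == "windows") 1L else 2L

res <- mclapply(1:8, function(i) i^2, mc.cores = n.workers)
unlist(res)  # 1 4 9 16 25 36 49 64
```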

Parallel randomForest example

library(parallel)
library(randomForest)

x <- matrix(runif(500), 100)
y <- gl(2, 50)
ntree <- 1000
cores <- detectCores()
vntree <- rep(ntree %/% cores, cores)
worker <- function(n) randomForest(x, y, ntree=n)
rf <- do.call('combine',
              mclapply(vntree, worker, mc.cores=cores))

Resources